Air Resistance


A highly maneuverable flying squirrel drone with agility-improving foldable wings

Lee, Dohyeon, Kang, Jun-Gill, Han, Soohee

arXiv.org Artificial Intelligence

These physical constraints cannot be fully addressed through advancements in control algorithms alone. Drawing inspiration from the winged flying squirrel, this paper proposes a highly maneuverable drone with agility-enhancing foldable wings. The additional air resistance generated by appropriately deploying these wings significantly improves the tracking performance of the proposed "flying squirrel" drone. By leveraging collaborative control between the conventional propeller system and the foldable wings--coordinated through the Thrust-Wing Coordination Control (TWCC) framework--the controllable acceleration set is expanded, allowing for the production of abrupt vertical forces unachievable with traditional wingless drones. The complex aerodynamics of the foldable wings are captured using a physics-assisted recurrent neural network (paRNN), which calibrates the angle of attack (AOA) to align with the real-world aerodynamic behavior of the wings. The model is trained on real-world flight data and incorporates flat-plate aerodynamic principles. Experimental results demonstrate that the proposed flying squirrel drone achieves a 13.1% improvement in tracking performance, as measured by root mean square error (RMSE), compared to a conventional wingless drone. A demonstration video is available on YouTube: https://youtu.be/O8nrip18azY
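The flat-plate aerodynamics the abstract refers to admit a compact closed form: lift and drag coefficients depend only on the angle of attack. Below is a minimal sketch of that baseline model, assuming standard sea-level air density; the function names, wing area, and airspeed values are illustrative, not from the paper, and the paper's paRNN calibrates the AOA fed into such a model rather than using raw geometry.

```python
import numpy as np

def flat_plate_coeffs(aoa_rad):
    """Classic flat-plate approximation: Cl = 2 sin(a) cos(a), Cd = 2 sin^2(a)."""
    cl = 2.0 * np.sin(aoa_rad) * np.cos(aoa_rad)
    cd = 2.0 * np.sin(aoa_rad) ** 2
    return cl, cd

def wing_forces(aoa_rad, airspeed, area, rho=1.225):
    """Lift and drag (in newtons) on a deployed wing of the given planform area."""
    q_area = 0.5 * rho * airspeed ** 2 * area  # dynamic pressure times area
    cl, cd = flat_plate_coeffs(aoa_rad)
    return q_area * cl, q_area * cd

# Example: a 0.05 m^2 wing at 20 degrees AOA in a 5 m/s airstream.
lift, drag = wing_forces(np.radians(20.0), airspeed=5.0, area=0.05)
```

At 45 degrees the flat-plate model predicts equal lift and drag coefficients, which is why deploying the wings produces large resistive forces useful for abrupt deceleration.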


LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

Tu, Shangqing, Wang, Yucheng, Zhang-Li, Daniel, Bai, Yushi, Yu, Jifan, Wu, Yuhao, Hou, Lei, Liu, Huiqin, Liu, Zhiyuan, Xu, Bin, Li, Juanzi

arXiv.org Artificial Intelligence

Existing Large Vision-Language Models (LVLMs) can process inputs with context lengths up to 128k visual and text tokens, yet they struggle to generate coherent outputs beyond 1,000 words. We find that the primary limitation is the absence of long output examples during supervised fine-tuning (SFT). To tackle this issue, we introduce LongWriter-V-22k, an SFT dataset comprising 22,158 examples, each with multiple input images, an instruction, and corresponding outputs ranging from 0 to 10,000 words. Moreover, to achieve long outputs that maintain high fidelity to the input images, we apply Direct Preference Optimization (DPO) to the SFT model. Given the high cost of collecting human feedback for lengthy outputs (e.g., 3,000 words), we propose IterDPO, which breaks long outputs into segments and uses iterative corrections to form preference pairs with the original outputs. Additionally, we develop MMLongBench-Write, a benchmark featuring six tasks to evaluate the long-generation capabilities of VLMs. Our 7B parameter model, trained with LongWriter-V-22k and IterDPO, achieves impressive performance on this benchmark, outperforming larger proprietary models like GPT-4o. Code and data: https://github.com/THU-KEG/LongWriter-V
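The IterDPO idea of segmenting a long output and pairing each corrected segment against its original can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the segment length, the `correct_fn` callback, and the pair format are all assumptions made here for clarity.

```python
def make_iterdpo_pairs(long_output, correct_fn, seg_len=500):
    """Split a long output into word segments; each corrected segment, paired
    with the original segment, yields one (chosen, rejected) preference pair.

    correct_fn: callable applying a (human or model) correction to one segment.
    """
    words = long_output.split()
    pairs = []
    for i in range(0, len(words), seg_len):
        segment = " ".join(words[i:i + seg_len])
        corrected = correct_fn(segment)
        if corrected != segment:  # only keep segments the correction changed
            pairs.append({"chosen": corrected, "rejected": segment})
    return pairs
```

Segmenting this way keeps each preference pair short enough to annotate cheaply, which is the stated motivation for avoiding full 3,000-word comparisons.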


Interpretable Representation Learning from Videos using Nonlinear Priors

Longa, Marian, Henriques, João F.

arXiv.org Artificial Intelligence

Learning interpretable representations of visual data is an important challenge, to make machines' decisions understandable to humans and to improve generalisation outside of the training distribution. To this end, we propose a deep learning framework where one can specify nonlinear priors for videos (e.g. of Newtonian physics) that allow the model to learn interpretable latent variables and use these to generate videos of hypothetical scenarios not observed at training time. We do this by extending the Variational Auto-Encoder (VAE) prior from a simple isotropic Gaussian to an arbitrary nonlinear temporal Additive Noise Model (ANM), which can describe a large number of processes (e.g. Newtonian physics). We propose a novel linearization method that constructs a Gaussian Mixture Model (GMM) approximating the prior, and derive a numerically stable Monte Carlo estimate of the KL divergence between the posterior and prior GMMs. We validate the method on different real-world physics videos including a pendulum, a mass on a spring, a falling object and a pulsar (rotating neutron star). We specify a physical prior for each experiment and show that the correct variables are learned. Once a model is trained, we intervene on it to change different physical variables (such as oscillation amplitude or adding air drag) to generate physically correct videos of hypothetical scenarios that were not observed previously.
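The numerically stable Monte Carlo KL estimate the abstract describes can be illustrated for one-dimensional Gaussian mixtures: sample from the posterior GMM, evaluate both mixture log-densities with the log-sum-exp trick (the standard stability device), and average the difference. This is a minimal sketch under those assumptions, not the paper's implementation, and the helper names are made up here.

```python
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture, computed via log-sum-exp."""
    x = np.asarray(x, dtype=float)[:, None]
    log_comp = (np.log(weights)
                - 0.5 * np.log(2.0 * np.pi * stds ** 2)
                - (x - means) ** 2 / (2.0 * stds ** 2))
    m = log_comp.max(axis=1, keepdims=True)  # subtract max for stability
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def mc_kl(posterior, prior, n=10_000, rng=np.random.default_rng(0)):
    """KL(posterior || prior) ~= mean of log q(x) - log p(x), x ~ posterior.

    posterior, prior: (weights, means, stds) arrays describing each GMM.
    """
    w, mu, sd = posterior
    comp = rng.choice(len(w), size=n, p=w)      # pick mixture components
    x = rng.normal(mu[comp], sd[comp])          # sample within each component
    return np.mean(gmm_logpdf(x, *posterior) - gmm_logpdf(x, *prior))
```

The estimate is exactly zero when the two mixtures coincide and grows as they separate, matching the role the KL term plays in the VAE objective.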


Learning and Optimization with Bayesian Hybrid Models

Eugene, Elvis A., Gao, Xian, Dowling, Alexander W.

arXiv.org Machine Learning

Bayesian hybrid models fuse physics-based insights with machine learning constructs to correct for systematic bias. In this paper, we compare Bayesian hybrid models against physics-based glass-box and Gaussian process black-box surrogate models. We consider ballistic firing as an illustrative case study for a Bayesian decision-making workflow. First, Bayesian calibration is performed to estimate model parameters. We then use the posterior distribution from Bayesian analysis to compute optimal firing conditions to hit a target via a single-stage stochastic program. The case study demonstrates the ability of Bayesian hybrid models to overcome systematic bias from missing physics with less data than the pure machine learning approach. Ultimately, we argue Bayesian hybrid models are an emerging paradigm for data-informed decision-making under parametric and epistemic uncertainty.
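The two-step workflow in the abstract, Bayesian calibration followed by a single-stage stochastic program, can be sketched on a drag-free ballistic toy problem: compute a grid posterior over an unknown launch speed from noisy range observations, then pick the firing angle that minimizes the posterior-expected squared miss distance. Everything here (the drag-free range formula, noise level, grids, target) is an assumption for illustration, not the paper's case study.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def ballistic_range(v, theta):
    """Drag-free projectile range for launch speed v (m/s) and angle theta (rad)."""
    return v ** 2 * np.sin(2.0 * theta) / G

# --- Step 1: Bayesian calibration (grid posterior over launch speed v) ---
rng = np.random.default_rng(1)
v_true, sigma = 20.0, 1.0
thetas_obs = np.radians([30.0, 45.0, 60.0])
y_obs = ballistic_range(v_true, thetas_obs) + rng.normal(0.0, sigma, 3)

v_grid = np.linspace(10.0, 30.0, 401)
loglik = sum(-0.5 * ((y - ballistic_range(v_grid, th)) / sigma) ** 2
             for y, th in zip(y_obs, thetas_obs))
post = np.exp(loglik - loglik.max())
post /= post.sum()                      # normalized posterior weights on v_grid

# --- Step 2: stochastic program (angle minimizing expected squared miss) ---
target = 35.0
candidates = np.radians(np.linspace(5.0, 85.0, 161))
expected_miss = [(post * (ballistic_range(v_grid, th) - target) ** 2).sum()
                 for th in candidates]
best_angle_deg = np.degrees(candidates[int(np.argmin(expected_miss))])
```

Averaging the loss over the whole posterior, rather than plugging in a point estimate, is what makes the second step a stochastic program under parametric uncertainty.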


The Physics of Building Jumps in 'The Matrix'

WIRED

You haven't seen The Matrix? Well, you should watch it. Here's the basic idea--some dude (Neo) finds out he's been living in a computer program. Since his world isn't "real," he is able to do some superhuman things--like dodge bullets and jump from one building to the next. Yes, this building jump is what I want to look at.


Here's How Fast That Jumping Tesla Was Traveling

WIRED

One of my part-time jobs is as an internet investigator. When crazy things happen, people want to know more about that crazy thing. In this case, the crazy thing is a Tesla driving super fast over a railroad crossing. It's going so fast that the car gets airborne before eventually losing control. Fortunately, it doesn't seem like anyone was seriously injured, and it is also fortunate that a security camera caught this motion on video. Normally when I need to find the velocity of an object in a video, I just use my typical video analysis techniques in which I mark the position of the object in each frame.
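The video-analysis technique described above boils down to finite differences: with the object's position marked in each frame and the frame rate known, consecutive positions give per-frame speeds. A minimal one-dimensional sketch, with made-up positions rather than data from the actual footage:

```python
def velocity_from_frames(positions_m, fps):
    """Average speed (m/s) from per-frame positions via finite differences."""
    dt = 1.0 / fps  # time between consecutive frames
    speeds = [abs(b - a) / dt for a, b in zip(positions_m, positions_m[1:])]
    return sum(speeds) / len(speeds)

# A car covering 1 meter per frame at 30 fps is moving 30 m/s (about 67 mph).
v = velocity_from_frames([0.0, 1.0, 2.0, 3.0], fps=30)
```

In practice the positions come from clicking through the video frame by frame, with a known object in the scene (like the rail-crossing width) setting the meters-per-pixel scale.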